Visualization and Control Techniques
for Multimedia Digital Content 

FIELD AND BACKGROUND OF THE INVENTION 

The present invention generally relates to the field of fast 
multimedia browsing. It particularly refers to different im- 
plementations of a conceptual framework which defines basic 
mechanisms for efficiently previewing multimedia content. 

Today, the availability of digital content archives compris- 
ing huge amounts of digital multimedia data requires effi- 
cient browsing mechanisms for extracting relevant informa- 
tion. To avoid information overload, a browsing system needs 
to preselect shots of information from a database in a user- 
adequate manner. Additionally, such a browsing system should 
be able to support continuous presentation of time-dependent
media. Users of browsing applications often have vague infor- 
mation needs which can only be described in conceptual terms. 
Additionally, a general browsing system must offer mechanisms 
for interactive inspection of the information following a 
user's instructions.



The use of a physical metaphor is a common strategy for
designing a user interface system. For example, as described in
"Multimodal Video Indexing: A Review of the State-of-the-Art"
(Technical Report 2001-20, Intelligent Sensory Information
Systems Group, University of Amsterdam) by C.G.M. Snoek and
M. Worring, conventional video indexing systems according to
the state of the art apply the technique of "book indexing"
to the field of video browsing.

Analysis of human behavior shows that many people normally
leaf quickly through the pages of an unknown magazine before
buying this magazine at the news kiosk. Thereby, a combination
of manual and mental processes is used to quickly browse
the information contained in the magazine until reaching a
certain level of understanding. Such a process can be iterated
several times for reaching higher and higher levels of
comprehension of the magazine's content. The reader's
"browsing speed" is dynamically adapted to the level of his/her
interest. If the reader is interested in the content of a page,
he/she can scroll around in fine-grained steps. Alternatively,
if he/she is not interested in this content, he/she
can jump forward or backward towards distant pages in a
non-linear way, while picking up some information from the
intermediate pages.

BRIEF DESCRIPTION OF THE PRESENT STATE OF THE ART

In order to understand the central idea of the invention, it 
is necessary to briefly explain some of the most important 
features of conventional video summarization, audio browsing 
and e-book systems according to the state of the art.

US 5,708,767 describes a method and an apparatus for video
browsing based on content and structure. Therein, a new
browsing technique for extracting a hierarchical decomposition
of a complex video selection is proposed, which combines
visual and temporal information to capture the most important
relations within a scene and between different scenes in a
video, thus allowing an analysis of the underlying story
structure without having a priori knowledge of the content.

In US 5,995,095, a method for hierarchical summarization and
browsing of digital video is disclosed which comprises the
steps of inputting a digital video signal for a digital video
sequence and generating a hierarchical summary that is based
on keyframes of said video sequence.

An automatic video summarization technique using a measure of
shot importance as well as a frame-packing method are
described in US 6,535,639.

In WO 00/39707 a personalized video classification and re- 
trieval system is disclosed that allows users to quickly and 
easily select and receive stories of interest from a video 
stream. 

A video-on-demand (VoD) system as well as a corresponding
method for performing variable speed scanning or browsing are
described in EP 0 676 878 A1.

US 2002/0051010 refers to a system for searching and browsing 
multimedia and, more particularly, to a video skimming method 
and apparatus which is capable of summarizing the full con- 
tent of video files within a short period of time by skimming 
the content of a video file and rapidly moving to a desired 
section. 

EP 1 205 898 A2 pertains to computer-implemented techniques
for improving reading proficiency. Thereby, a segment of text
is displayed on a video screen.

GB 2 322 225 is directed to a continuous search method and
apparatus for searching among discontinuously recorded sheets
of photo information, which are recorded in a digital recording
medium (e.g. a digital video cassette recorder).

US 5,847,703 refers to a method and apparatus for browsing 
through a motion picture in order to locate desired segments 
in said motion picture.

A method carried out in an image processing system for
selecting text and image data from video images is disclosed in
US 6,178,270.

PROBLEMS TO BE SOLVED BY THE INVENTION

Today, the increasing amount of digital multimedia content
(video, audio, and text data from movies, Web pages, e-books,
audio and video files, etc.) is opening up a vast range of
problems and challenges related to the consumption of
multimedia content. One of the major problems is how to quickly
browse through digital multimedia content in order to get an
impression or a digest of the contained information in a short
time, since browsing and making a digest of digital multimedia
content is generally very time-consuming. Up to now, however,
most of the presently available automatic or semi-automatic
video summary systems suffer from many limitations:

- poor computer-based user interactions and use of
  non-intuitive GUI paradigms,

- high computational complexity (especially when complex
  extraction algorithms are applied, which are often restricted
  to a specific content type),

- focus on either preview or digest,

- too long preview time, and

- poor modeling of user preferences.

OBJECT OF THE PRESENT INVENTION 

In view of the explanations mentioned above, it is the object
of the invention to provide users with a visual/manual
framework for enabling an efficient and intuitive way of
previewing digital multimedia content.

This object is achieved by means of the features of the
independent claims. Advantageous features are defined in the
subordinate claims. Further objects and advantages of the
invention are apparent in the detailed description which follows.

SUMMARY OF THE INVENTION 

The proposed solution is dedicated to different implementations
of a multimedia preview system for quickly and interactively
browsing digital multimedia content (video, audio,
and/or text data) with the aid of a user-driven,
speed-dependent browsing process. In this connection, a conceptual
framework is introduced which defines basic mechanisms for
"leafing through" multimedia content (e.g. the content of
an electronic book, a digital video or a digital audio file).
Said system thereby provides a user interaction model emulating
the intuitive manual and mental process of leafing
through the pages of a book or an illustrated magazine. The
system thereby provides special navigation patterns either
for previewing digital multimedia content or for obtaining a
quick digest.

This includes a dynamic presentation scheme of the digital
content depending on the speed of browsing and different
semantic levels, an effective and intuitive manual user
interaction pattern and a non-linear navigation pattern for
emulating coarse-grained page leafing. In this context, the
expression "semantic level" is used to identify possible
abstract representations of the same digital content carrying
different amounts of information. For example, a simple text
can be represented with different semantic levels: title,
captions, key words, etc., all carrying different parts of
the entire information. In case of video data, the herein
proposed solution provides a more neutral and personal way of
browsing and trades off the complexity of conventional video
extraction algorithms according to the state of the art
against a higher level of user control.

According to the invention, a "semantic zoom" functionality,
the usage of multiple information modes and natural user
interaction are combined. Semantic zoom on content is defined
as the possibility of visually providing information with
different degrees of importance for the full understanding of
the content itself. While information contained in a video or
a book is normally provided by using redundant media, e.g.
text, pictures, moving pictures, etc., semantic zoom is a
technique which, depending on the required degree of detail,
presents the minimum non-redundant subset of the information
in a large information container.

Said multimedia preview system can be realized as a
video-on-demand system with an additional video browsing
functionality for varying the speed and detail level of
presentation depending on the type and/or frequency of user
commands instructing the multimedia preview system to change the
speed of browsing, such that the detail level is higher the
lower the speed of presentation, and vice versa.
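
The inverse relation between browsing speed and detail level can be illustrated by a minimal Python sketch. The threshold values and the function name detail_level are hypothetical and are not taken from the description; only the three layout names and the rule "the lower the speed, the higher the detail" come from the text.

```python
from bisect import bisect_left

# Hypothetical thresholds, expressed as multiples of normal playback speed.
SPEED_THRESHOLDS = [2.0, 8.0]
# Layout names as used in the description, ordered from most to least detail.
LEVELS = ["fine-grained", "middle-coarse", "coarse-grained"]


def detail_level(speed: float) -> str:
    """Return the detail level for a browsing speed (slower means more detail)."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    # bisect_left counts how many thresholds lie strictly below the speed.
    return LEVELS[bisect_left(SPEED_THRESHOLDS, speed)]
```

With these assumed thresholds, a speed of up to twice the normal rate keeps the fine-grained layout, while anything above eight times the normal rate falls back to the coarse-grained layout.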

One possible implementation of the proposed multimedia preview
system works by decomposing the content of e.g. a digital
video or audio file, assigning segmented parts of said
content to different speeds of browsing representing different
levels of detail and controlling the speed of browsing.
However, it is important to note that - in contrast to
WO 00/39707 and US 2002/0051010 - the proposed multimedia
preview system according to the present invention does not
depend on semantic analysis tools such as shot and scene
detection means. Instead, said file is temporally compressed by
cutting out several parts of a video sequence, and the speed
of scrolling the visualized information can be controlled by
a user. Thereby, said user has the possibility to interactively
control two parameters in real time: the playback
speed and the size of the video sequence that has to be
temporally compressed. The intention of the proposed system is
to give the user an impression of the content without trying
to "hide" uninteresting parts, such that the user can decide
for himself/herself whether said content is interesting or not.
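
The temporal compression and its two user-controlled parameters can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the description does not specify which parts are cut out, so the sketch simply keeps the first `keep` seconds of every `window` seconds; the names compressed_segments and preview_time are hypothetical.

```python
def compressed_segments(duration: float, window: float, keep: float):
    """Return (start, end) pairs covering the first `keep` s of each `window` s.

    `window` stands for the size of the sequence that is temporally
    compressed; `keep` is an assumed length of the part that survives.
    """
    if not 0 < keep <= window:
        raise ValueError("need 0 < keep <= window")
    segments = []
    start = 0.0
    while start < duration:
        segments.append((start, min(start + keep, duration)))
        start += window
    return segments


def preview_time(segments, speed: float) -> float:
    """Time needed to play all kept segments at the given playback speed."""
    return sum(end - start for start, end in segments) / speed
```

For example, a 600-second video with window = 120 and keep = 10 yields five segments, which play in 25 seconds at twice the normal speed. No shot or scene detection is involved; the cuts depend only on time.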

In contrast to EP 0 676 898 A1, the data segments played back
by the proposed multimedia leafing system are much larger
than typical MPEG-2 GOFs. Moreover, the segmentation is not
based on the kind of compression technique used in MPEG. It
should further be noted that the segments themselves are played
at a speed other than the normal playback rate.

The fundamental approach of the present invention is not to
make any particular assumption regarding user preferences
or content type but to provide a multimedia preview system
that enhances a user's capability of browsing digital content,
fully controlled by the user himself/herself.

BRIEF DESCRIPTION OF THE DRAWINGS 

Further advantages and embodiments of the present invention
result from the subordinate claims as well as from the
following detailed description of the invention as depicted in
the accompanying drawings:

Figs. 1a-c   show different types and speeds of leafing
             through an illustrated magazine,

Fig. 1d      is a flow chart illustrating an algorithm which
             approximates the mental process of a person
             while leafing through an illustrated magazine,

Fig. 2       illustrates the process of multimedia
             decomposition,

Fig. 3       shows a schematic example of a spatial layout
             having different detail levels of presentation
             for the content of a video sequence to be
             previewed in text and/or image,

Fig. 4       is a program sequence showing an XML-based
             representation of metadata which is used for
             browsing the content of multimedia data to be
             previewed,

Fig. 5       is a timing diagram of a virtually structured
             movie,

Fig. 6a      shows three input and navigation control devices
             which can be used as human-machine interfaces
             for previewing a video sequence,

Fig. 6b      shows an example of browsing through a video
             sequence by using the "semantic zoom" function
             offered by a video preview system according to one
             embodiment of the present invention,

Figs. 7a-d   are four diagrams illustrating the dynamic
             presentation layout during a browsing process,

Fig. 8a      shows a bendable PDA which can be used as a
             human-machine interface for remotely controlling a
             video preview system,

Fig. 8b      shows a credit-card sized display device with an
             integrated navigation system,

Figs. 8c-e   show different detail levels of a virtual map
             displayed on an integrated display of the
             bendable PDA or credit-card sized display device,
             respectively,

Fig. 8f      shows the rear side of the credit-card sized
             display device with an integrated touch pad,

Fig. 9       shows a DVD cover with integrated display
             capabilities which can be used as a human-machine
             interface for remotely controlling a video
             preview system,

Fig. 10      shows a multimedia preview system in a client/
             server-based network environment for browsing
             the content of requested multimedia data to be
             previewed,

Figs. 11a-c  are different layouts of the same electronic
             paper (e.g. a Web page),

Figs. 12a-c  show three schematic layouts of an electronic
             paper consisting of different frames for
             displaying pictures, titles and body text,

Fig. 13      shows seven schematic layouts of a page from an
             electronic paper during fast browsing,

Fig. 14      shows different examples of input and control
             devices which can be used for creating "leafing
             events" needed for browsing the content of an
             electronic paper,

Fig. 15      is a schematic diagram showing examples of a
             user's input actions for navigating through the
             pages of an electronic book,

Fig. 16      shows a three-dimensional schematic view of an
             electronic book device having a touch-sensitive
             surface for inputting control information needed
             to increase or decrease the speed of browsing
             and/or the detail level of presentation to be
             displayed through the pages of the electronic
             book,

Fig. 17      shows a graphical user interface of a client
             terminal running an application program for
             controlling an audio player downloaded from an
             application server in a client/server-based
             network environment,

Fig. 18a     is a diagram illustrating a speed-dependent
             spatial-temporal-semantic information layout, and

Fig. 18b     is a diagram showing a non-linear "long"
             displacement action with a dynamic
             spatial-temporal-semantic information layout.

DETAILED DESCRIPTION OF THE PRESENT INVENTION 

In the following, embodiments of the present invention as
depicted in Figs. 1a to 18b shall be explained in detail. The
meaning of all the symbols designated with reference numerals
and signs in Figs. 1a to 18b can be taken from an annexed
table.

Figs. 1a-c illustrate both the mental and manual process of a
reader when leafing through the pages of an illustrated
magazine. Thereby, said reader focuses on different parts of a
page, mentally filters the information displayed and,
depending on his/her particular degree of interest, browses the
content of the magazine in a quicker or slower way,
respectively.

A flow chart which models this leafing process is depicted in
Fig. 1d. Thereby, said reader spatially focuses on different
elements of a page (picture, titles, subtitles, etc.) and
navigates pseudo-randomly following his/her interests. In
such a way, said reader can browse a large quantity of
information in a quick and easy way.

The process of mentally filtering is also driven by the
different kinds of content and the structure of the displayed
pages themselves: For example, pictures are faster to interpret
than textual information, titles and subtitles are easier to
read than body text, and so on. Thereby, said process can be
supported by using the fingers for reading a magazine line by
line. However, navigating through the pages of a magazine is
more sophisticated, which means that the navigation process
does not proceed with simple patterns (such as e.g. patterns
for controlling a video cassette recorder). By observation,
different types of navigation patterns in leafing through an
illustrated magazine can be identified: slow scrolling,
fast scrolling, skipping to different pages, backward and
forward acceleration, etc. There is neither a fixed maximum
time nor any fixed spatial layout for going through the
information; instead, the user himself/herself sets the pace
of the navigation procedure by following pseudo-random paths.

Such a process can be generalized and applied to different 
digital content. The focus here on "digital" stresses the
fact that digital data can easily be manipulated for differ- 
ent purposes by means of automatic algorithms. 

A digital video can easily be separated into different, simpler
component information channels: video scenes can be
represented by picture frames, audio can be classified into
several types (speech, music, silence), etc. Additionally,
information can be transformed by using alternative surrogates
depending on the level of "interests/details" or the speed of
browsing.

A first embodiment of the present invention refers to the use
of said leafing model in a video preview system. For this
purpose, it is necessary to divide a video stream to be
previewed into its elementary objects (frames, text, and sound
samples), thereby providing different abstraction/detail
levels of presentation ("semantic degrees"). Said objects are
then extracted on a time-based scale. Initially, no effort is
spent to extract semantically meaningful objects (e.g. scene/
shot boundaries, see Fig. 2). For example, main frames can be
extracted every 2-3 minutes and other frames every 15-20
seconds. These two groups represent two semantic zoom levels for
the visual information to be displayed, because they give the
user a partial semantic "window" on the content itself.
Dialog text and scene commentary, which represent two further
semantic levels, are synchronized with the visual content.

Thus, different information representations are grouped into
three semantic spatial layouts: coarse-grained (keyframes and
text summary), middle-coarse (other frames and dialog text),
and fine-grained layouts (sound activated).

The user can at any time select full video playback. During
quick browsing, the information is dynamically displayed
with a variable semantic layout depending on the user's
browsing speed. For example, in a long-jump transition the
user is shown only a few keyframes in a slide show mode and
only summary keywords. Combinations of browsing speeds and
information layouts are used, especially in case of
non-linear backward/forward jumps through a video sequence.

Fig. 2 illustrates an example of a multimedia decomposition
process according to the present invention, which is applied
to the content of a video sequence presented by a multimedia
preview system. Said preview system features a "multimedia
semantic zoom" function according to the present invention
offering different speeds and detail levels of presentation
in text and/or image which can be selected by a user to
browse the content of said video sequence more quickly or more
slowly, wherein the detail level is higher the lower the speed
of presentation, and vice versa.

The "semantic zoom" function provides a user with different
views of requested multimedia information (videos, electronic
books, audio files) depending on the needed degree of
detail. For example, a magazine as a container of information
can be represented with subsets of the contained information
depending on the degree of detail the reader wants to have. A
video can be represented in time, space, quality, etc. with
different alternative subsets of the information it contains.
The semantic zoom ranges from the full content (e.g. playback
of a video, reading a book page by page, etc.) to a variable
lower detail level of presentation providing simplified
surrogates of information (pictures, text, audio samples, etc.).
The final goal is to give the user the capability to
associate the degree of required detail with the speed of
navigation.

One embodiment of the present invention pertains to a method
for browsing the content of requested multimedia data to be
previewed, said content being displayed on a client terminal
1006 accessing a multimedia server 1002 which holds said
multimedia data. After having downloaded (S0) said multimedia
data from the multimedia server 1002 to said client terminal
1006 via a network link, said multimedia server 1002 receives
(S1a) and processes (S1b) user commands demanding a change in
the speed of browsing and/or in the abstraction level of
presentation, in the following referred to as "representation
parameters". After that, said multimedia data are decomposed
(S2) into non-redundant and redundant, less relevant parts
according to an offline image and/or text segmentation
algorithm. These representation parameters are then adapted (S3)
by online filtering out (S3') a certain amount of said
redundant, less relevant parts depending on type and/or frequency
of said user commands such that the degree of presented
detail is higher the lower the speed of presentation, and
vice versa. Finally, an adapted version of said multimedia
data is displayed (S4) on said client terminal 1006.
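
The online filtering of step S3' can be pictured with a short Python sketch. It assumes, purely for illustration, that the offline decomposition of step S2 has already tagged every part with an integer redundancy rank (0 for the non-redundant core, larger values for increasingly redundant material) and that each doubling of the browsing speed drops one rank. The Part class, the rank scheme and the function filter_parts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    start: float     # position in the media, in seconds
    kind: str        # e.g. "frame", "text" or "audio"
    redundancy: int  # 0 = non-redundant core; higher = more redundant


def filter_parts(parts, speed: float, max_rank: int = 2):
    """Keep only the parts whose redundancy rank is allowed at this speed.

    Assumed rule: at normal speed (1.0) every rank up to `max_rank` is
    shown, and each doubling of the speed removes one rank.
    """
    if speed < 1.0:
        raise ValueError("speed is a multiple of normal playback, >= 1.0")
    allowed = max(0, max_rank - int(math.log2(speed)))
    return [p for p in parts if p.redundancy <= allowed]
```

At speed 1.0 all parts are presented, at speed 2.0 the most redundant rank is dropped, and at speed 4.0 or more only the non-redundant core remains, which matches the rule that fewer details are presented at higher speeds.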

Advantageously, metadata of any kind allowing users to
identify segmented parts of multimedia data to be previewed are
associated (S5a) with said multimedia data. Thereby, said
metadata have to be synchronized (S5b) with said multimedia
data.

The proposed "leafing model" is completely different from
conventional indexing methods. While books, due to their
textual nature, are easily browsed through summaries and
indexes - which is the basis of conventional video
summarization techniques according to the state of the art - the
multimedia nature of video content imposes severe limitations
on such a technique.

For implementing a system similar to the classical "leafing
model" for video content, the video information has to be
separated into its elementary components. In the following,
this decomposition and semantic zoom approach is described in
detail as an introduction to the proposed implementation.

Different keyframes are used to quickly browse visual
content: While several systems have been introduced for keyframe
detection/selection, the proposed system uses a combination
of user interactions and constant-interval selection. The
system described here selects different keyframes at a constant
temporal distance for easing the selection process and relies
on a user's capability to go through a large amount of
information.

The nature of video artifacts implies that within a single
interval of duration D (e.g. 2 minutes), every frame can
statistically be used as a representative frame due to the fact
that the video information does not change abruptly. In
mathematical terms this can be described as follows:

"In a slowly changing function y = f(x), a value f(x_0) can be
used as an approximation of f(x) in an interval [x_0, x_0 + D],
wherein D is chosen small enough."

The loss of information in quickly changing video scenes is
neutralized by the usage of several redundant information
modes (text, colors, etc.).

Other new or traditional techniques can also be applied for
keyframe detection, thereby taking into account an increase
in processing time, which can harm the possibility of
proceeding in real time. The time interval D can vary depending
on the speed and frequency of user commands.
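
How the interval D might follow the frequency of user commands can be sketched in Python. The linear rule, the two-second observation window and all default values below are assumptions chosen for illustration (the base of 15 seconds and the cap of 180 seconds merely echo the example intervals mentioned earlier); the name sampling_interval is hypothetical.

```python
def sampling_interval(command_times, now: float, base: float = 15.0,
                      window: float = 2.0, gain: float = 1.0,
                      max_interval: float = 180.0) -> float:
    """Return the keyframe interval D in seconds.

    `command_times` holds the timestamps (seconds) of past user commands.
    The more commands fall into the last `window` seconds, the larger D
    becomes, i.e. the coarser the sampling of keyframes.
    """
    recent = [t for t in command_times if now - window <= t <= now]
    rate = len(recent) / window  # commands per second
    return min(max_interval, base * (1.0 + gain * rate))
```

With no recent commands the sketch returns the base interval of 15 seconds; four commands within the last two seconds give D = 45 seconds, and very frequent commands saturate at the cap of 180 seconds.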

According to a further embodiment of the invention, audio
components of a video film are reduced to audio samples (e.g.
very short samples of a soundtrack, speech samples, etc.),
and speech components of said video film are provided using
textual surrogates on simple semantic levels, said surrogates
being the equivalent of titles and subtitles when reading a
magazine. The user, in some cases, is interested in a very
general overview of the information currently browsed, such
that it is sufficient to present information on a higher
level of abstraction by neglecting details.

Different levels of textual information can be provided
depending on the complexity of the video content: For example,
in a reportage or university lesson video only the textual
speech of the speaker can be provided, while in a movie a
general summary of video intervals or, going to much more
fine-grained levels, dialogs or sound samples can be
provided.

In the visual and textual domain different degrees of semantic
zoom can be provided, e.g. by using key words or different
picture qualities.

Once all the main components have been separated, the
information is structured in order to help the user to focus on
different degrees of semantics. Then, a spatial layout showing
different detail levels of presentation for the content of a
video sequence to be previewed as depicted in Fig. 3 (a
so-called "static layout", because it is presented to the user
while no interaction is performed) can be defined. The
spatial collocation of the information helps the user to focus
on different levels of detail.

As shown in the example depicted in Fig. 3, two main areas
can be defined: a coarse-grained one 306a-e and a
fine-grained one 308a-h. The first area consists of a collection
of N keyframes 306a-e, wherein an interval T of approximately
two minutes represents a large video interval. Textual areas
310 and 312 are added where e.g. a very general textual
summary of the corresponding keyframes is displayed. In this
case an alternating "semantic zoom" is reproduced between the
visual information and the textual one. The approximation of
such information is reinforced by the redundancy introduced
by text and images. In the keyframe area 306a-e a perspective
technique is used to focus the main attention on a single
frame. At the same time the user has a temporal view on a
video interval having a total length of N·T.

The fine-grained area 308a-h features the same pattern but
with a smaller sample time interval (e.g. 30 seconds), and it
refers only to the time interval of the main keyframe (see
Fig. 3). The textual information has a more detailed view on
the information: For a movie this can be associated with dialog
text 310, for a reportage with a speaker's voice transcription,
etc.
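
The timing behind such a static layout can be worked through in a short Python sketch. It is only an illustration: the function name static_layout, the choice to center the N coarse keyframes on the main keyframe, and the default values (N = 5, T = 120 seconds, fine step of 30 seconds) are assumptions, and clipping at the end of the video is omitted for brevity.

```python
def static_layout(position: int, n: int = 5, coarse_step: int = 120,
                  fine_step: int = 30) -> dict:
    """Compute keyframe timestamps (seconds) for the two layout areas.

    `position` is the current playback position in seconds. The coarse
    area shows `n` keyframes spaced `coarse_step` apart; the fine area
    samples only the interval that belongs to the main keyframe.
    """
    main_index = position // coarse_step
    first_index = max(0, main_index - n // 2)
    main_start = main_index * coarse_step
    return {
        "coarse": [(first_index + k) * coarse_step for k in range(n)],
        "main": main_start,
        "fine": list(range(main_start, main_start + coarse_step, fine_step)),
        "span": n * coarse_step,  # total length N*T covered by the coarse area
    }
```

For a position of 400 seconds, the main keyframe starts at 360 seconds, the coarse area shows the keyframes at 120, 240, 360, 480 and 600 seconds (a span of N·T = 600 seconds), and the fine area shows frames at 360, 390, 420 and 450 seconds.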

The space allocated to each area should be proportional to
the carried degree of semantics: In the proposed system more
than 50% of the available space is given to the highest semantic
level, that is the coarse-grained visual area 306a-e (see
Fig. 3). Other values can be used or manually set by the
user. Such a distribution of the spatial layout can be done
automatically depending on the user's navigation patterns (fast,
slow, etc.).

Other information can be shown by using alternative
surrogates: For example, audio dynamics can be shown by using
colors on a navigation bar. The sound of a movie is present in
such a layout, although it is not spatially displayable: If
the user maintains the focus on the currently displayed layout
for a time Δt (e.g. 10 seconds), a soundtrack is offered that
characterizes the currently displayed video time interval.
Thereby, a user can automatically or manually activate the
video playback that represents the full video information
(maximum semantic level).
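
The dwell-time condition for offering the soundtrack can be expressed as a small Python sketch. The class name DwellTimer and its methods are hypothetical, and timestamps are passed in explicitly so that the logic does not depend on a particular clock.

```python
class DwellTimer:
    """Tracks how long the currently displayed layout has stayed unchanged."""

    def __init__(self, dwell: float = 10.0):
        self.dwell = dwell       # required focus time (delta t) in seconds
        self.last_change = 0.0   # timestamp of the latest layout change

    def layout_changed(self, now: float) -> None:
        """Call whenever a navigation action replaces the displayed layout."""
        self.last_change = now

    def sound_active(self, now: float) -> bool:
        """True once the layout has been in focus for at least `dwell` s."""
        return now - self.last_change >= self.dwell
```

Each navigation action restarts the timer, so the soundtrack is offered only when the user lingers on one layout, which corresponds to the fine-grained, sound-activated level.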

The dynamic behavior of the system described in this patent
application is of fundamental importance. Starting from the
dynamic analysis of "magazine leafing", the importance of
having a more intuitive navigation pattern shall now be
emphasized. Such a navigation pattern should map to the fuzzy
behavior and capabilities of leafing through a magazine with
"digital fingers".

Fig. 6a shows different input and navigation control devices
602a-c which can be used as human-machine interfaces for
previewing a video sequence, said devices including a three-key
touch pad display 602a serving as a human-machine interface
for navigating through a list of symbols for playing,
slow/fast scrolling or skimming a video sequence, a rolling
mouse 602b serving as a human-machine interface for
performing the aforementioned navigating actions and a remote
control device 602c having control keys for executing functions
of a video cassette recorder (e.g. playing, fast forwarding
and fast rewinding a video sequence, pausing or stopping a
video playback, etc.). Fig. 6a further shows a variety of
navigation actions executable by said devices which are used
to control the speed and acceleration of browsing. Thereby,
speeding up and accelerating have a corresponding effect on
the envisioned information.

In this connection, each action is associated with a specific
"semantic level", which means the information displayed can
be prioritized depending on the respective abstraction level
of presentation, e.g. such that the degree of presented
detail is higher the lower the speed of presentation, and
vice versa. Whereas text data can easily and quickly be
visualized and read by the user, audio data are not easily
reproducible during quick browsing. For example, during fast
browsing all the information displayed in the "static layout"
is reduced to maintain a higher abstraction level of
presentation, which means that a presentation layout showing
only a few details is displayed. Figs. 7a-d show four diagrams
illustrating the dynamic presentation layout during a browsing
process, wherein the speed and detail level of presentation
in text and/or image is varied depending on user commands
instructing the video preview system to browse the content of
said video sequence more quickly or more slowly. For example,
the static layout can be simplified for focusing on more
coarse-grained details.

It is also part of the dynamic behavior of the proposed
multimedia preview system to dynamically change the spatial,
temporal and semantic parameters of the system. For example,
textual information is reduced to keywords. Thereby, a user
can zoom semantically over a text area both in the physical
dimension (bigger fonts) and in the semantic dimension
(keywords). Moreover, the time intervals used for selecting the
picture frames can vary following the speed and/or frequency
of user commands.
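
The inverse relation between browsing speed and detail level described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the linear mapping, the speed range of 1x to 16x, the base interval of 2 seconds and the 0.5 switch-over point are all hypothetical and are not taken from the described system.

```python
def detail_level(speed: float, max_speed: float = 16.0) -> float:
    """Return a detail level in [0, 1]: 1.0 at normal speed (1x), 0.0 at max_speed."""
    s = min(max(speed, 1.0), max_speed)  # clamp to the assumed speed range
    return 1.0 - (s - 1.0) / (max_speed - 1.0)


def keyframe_interval(speed: float, base_interval_s: float = 2.0) -> float:
    """Seconds of source video between two selected keyframes; grows with speed."""
    return base_interval_s * max(speed, 1.0)


def text_for_level(full_text: str, keywords: list, level: float) -> str:
    """Show the full text at high detail and only the keywords at low detail."""
    return full_text if level >= 0.5 else " ".join(keywords)
```

A faster browsing command thus yields a lower detail level, a coarser keyframe sampling and a keyword-only text view, which corresponds to the semantic zoom over the text area described above.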

A user input device which is deployed for entering a user's
navigation commands for remotely controlling a multimedia
preview system should be able to intuitively map entered
commands to programmed navigation actions to be executed by
said preview system (e.g. actions changing the speed of
browsing). For example, a touch pad display 602a as depicted
in Fig. 6a can be used for navigating through a list of
symbols for playing, slow/fast scrolling or skimming a video
sequence. This technology can be used in addition to all
kinds of devices capable of enabling a „leafing" action in an
intuitive way, especially bendable user interfaces allowing
continuous input actions by using e.g. the fingers. Bendable
user interfaces have been realized (see Figs. 8a-f). Fig. 8a
shows that slightly bending such a bendable PDA 800a calls a
programmed function of the preview system (e.g. to zoom in or
out of a virtual map 800c-e displayed on an integrated
display 802a). Another prototype is realized as a credit-card
sized display device 800b (a so-called „tourist browser")
with an integrated navigation system as depicted in Fig. 8b.
This device comprises a variety of piezoelectric sensors
detecting the card being bent up or down.

The system described above can also be used e.g. for
promoting digital video content. This can be achieved by e.g.
implementing the navigation mode described above on Web
sites. Another approach is to use DVD covers with integrated
display capabilities as human-machine interfaces for remotely
controlling a video preview system according to the present
invention (see Fig. 9).

All the extra information needed to run a system as described
above can automatically be retrieved from e.g. movie
transcripts, director storyboards, etc. or extracted from
e.g. a video source (e.g. by performing a speech-to-text
extraction, etc.). Such a set of extra data can be embedded
in the movie or delivered separately. In the scope of the
present invention, movie transcripts are used as a data
source for additional textual information, because they are
easily available and can easily be processed.

According to one embodiment of the invention, all the extra
information is embedded in an XML file, separate from the
video content. This separation is useful for backward
compatibility with old video content and is better suited to
network delivery. A program sequence showing an XML-based
representation of metadata which is used by the multimedia
preview system according to the present invention for
browsing the content of multimedia data to be previewed is
shown in Fig. 4. In this way, information is introduced that
can easily be extracted from already existing sources (e.g.
dialogs, summary, soundtrack, etc.). This information is then
structured with the aid of an XML tree. Thereby, said
metadata are used to support the navigation and the semantic
zoom process.
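
As an illustration of how a client might read such a separate XML metadata file, the following Python sketch parses a small sample with the standard library. The element and attribute names (`movie`, `summary`, `dialog`, `keyframe`, `start`, `end`, `time`, `src`) are invented for this example; they are an assumption and do not reproduce the actual schema of Fig. 4.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata file; the schema is an assumption for illustration only.
SAMPLE = """
<movie title="Example">
  <summary start="0" end="120">Two friends plan a journey.</summary>
  <dialog start="12.5" end="15.0">Where are we going?</dialog>
  <keyframe time="12.5" src="kf_0001.jpg"/>
</movie>
"""


def load_metadata(xml_text: str) -> dict:
    """Parse the hypothetical metadata file into plain Python lists."""
    root = ET.fromstring(xml_text.strip())

    def timed(tag: str) -> list:
        # Collect (start, end, text) triples for every element with this tag.
        return [
            (float(e.get("start")), float(e.get("end")), (e.text or "").strip())
            for e in root.iter(tag)
        ]

    return {
        "title": root.get("title"),
        "summary": timed("summary"),
        "dialogs": timed("dialog"),
        "keyframes": [
            (float(e.get("time")), e.get("src")) for e in root.iter("keyframe")
        ],
    }
```

Because the metadata live in their own file, the video stream itself stays untouched, which matches the backward-compatibility argument made above.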

Applying the above-described techniques results in a
„virtual" video structure (see Fig. 5) with different detail
levels of presentation. As shown in Fig. 5, video content can
be structured in a way that facilitates the previewing
procedure. The multimedia preview system therefore provides a
quick overview within a time span of e.g. 8 to 10 minutes, in
which information can be shown using different „multimedia
modes". A virtually structured movie (consisting of video
frames, summary, dialogs, etc.) is previewed only within a
time window of a predefined size. Thereby, only information
contained in this window is presented to the user, comparable
to a single page of an electronic book.
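
The time-window idea can be sketched as a simple overlap filter over timed items from the different tracks. The `Item` type, its field names and the half-open window convention are assumptions made for this example only.

```python
from typing import List, NamedTuple


class Item(NamedTuple):
    track: str    # e.g. "keyframe", "dialog", "summary" (hypothetical labels)
    start: float  # seconds from the beginning of the movie
    end: float
    payload: str


def visible_items(items: List[Item], window_start: float, window_size: float) -> List[Item]:
    """Return the items overlapping the window [window_start, window_start + window_size)."""
    window_end = window_start + window_size
    return [it for it in items if it.start < window_end and it.end > window_start]
```

Moving `window_start` forward or backward then corresponds to turning to the next or previous "page" of the virtually structured movie.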

The proposed multimedia preview system for browsing the con- 
tent of requested multimedia data to be previewed works ac- 
cording to a straightforward approach using two separate data 
sources: one is the video itself, and the other is a general 
data container with all the information needed for the navi- 
gation mode: keyframes, textual information (e.g. dialogs, 
video summaries), additional sound samples, etc., which are 
formatted e.g. by using XML files (see Fig. 4).

Thereby, a client application downloads and uses the latter
data container for executing the major navigation steps.
Whenever a user decides to play back a movie, an Internet
video server is requested to deliver the video stream. This
solution is fully backward compatible with old movies that do
not contain any extra information and with movies having
different formats. As shown in Fig. 5, metadata needed for
said navigation actions can be transmitted separately or
embedded in an original multimedia format. Once a user is
remotely accessing and leafing through requested multimedia
content, he/she can access the full information by using a
traditional multimedia Web server.

Electronic book, electronic ink, electronic paper as well as
Web page devices are all examples of digital substitutes for
traditional literature such as books, magazines, newspapers,
etc. The digital format of these hand-held computer devices
for displaying electronic literature increases the
flexibility and the overall potential of a traditional text
preview system: embedding multimedia objects, screen
adaptation and hyperlinking are just a few examples of what
can be done when using this digital format.

The physical realization of electronic literature has the
major drawback of limiting the traditional way of interacting
with the paper media: traditional books can be leafed
through, manipulated, bent, etc. In the near future many
kinds of traditional paper content will also be provided in
the form of electronic literature (e.g. comics, art catalogs,
language courses, etc.), and a flexible way to manipulate
them will be needed.

Up to now, there are no solutions which try to recreate
intuitive actions for previewing „paper" content in a digital
e-book. For example, it is not possible to „leaf" through the
pages of an electronic document.

Therefore, one embodiment of the present invention
particularly refers to a previewing system for hand-held
computer devices (e.g. e-book devices) which can be used for
displaying electronic documents. The invention further
pertains to input add-ons to an e-book device, which can be
used for recreating the experience of leafing through the
pages of a book. Said e-book device comprises a
touch-sensitive surface which can be used for virtually
navigating through the digital pages (see Figs. 14a-c), e.g.
by touching various elements of an integrated linear-scaled
touch-sensitive stripe 1404a (see diagram 1400a), shifting
and/or turning a cylindrical control device 1404b (see
diagram 1400b) or moving a finger over a nonlinear-scaled
touch-sensitive stripe 1404c, wherein the finger position on
said stripe corresponds to a selected speed of browsing
and/or the displayed detail level of presentation (see
diagram 1400c). What is offered by this preview system is a
dynamic layout of an original electronic content (e.g. pages
from an electronic magazine or from a requested Web site)
that is automatically adapted to a user's navigation actions.
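
The difference between a linear-scaled and a nonlinear-scaled stripe can be illustrated as two mappings from finger position to browsing speed. This is a sketch under assumptions: the normalized position in [0, 1], the maximum speed of 16x and the choice of an exponential curve for the nonlinear stripe are hypothetical and are not specified in the description.

```python
def _clamp01(pos: float) -> float:
    """Restrict a normalized finger position to the stripe range [0, 1]."""
    return min(max(pos, 0.0), 1.0)


def linear_stripe_speed(pos: float, max_speed: float = 16.0) -> float:
    """Linear scale: equal finger travel gives equal speed increments."""
    return 1.0 + _clamp01(pos) * (max_speed - 1.0)


def nonlinear_stripe_speed(pos: float, max_speed: float = 16.0) -> float:
    """Exponential scale: fine control near normal speed, coarse control near the top."""
    return max_speed ** _clamp01(pos)
```

With these assumed curves, the middle of the linear stripe selects 8.5x while the middle of the nonlinear stripe selects 4x, so the nonlinear stripe leaves more room for slow, detailed browsing.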

Fig. 16 shows a three-dimensional schematic view of an
electronic book device 1600 having a touch-sensitive display
1502 for inputting control information needed to increase or
decrease the speed of browsing through the pages of the
electronic book and/or the detail level of presentation to be
displayed.

One embodiment of the invention pertains to the use of
specific interaction patterns in the scope of the proposed
leafing model. Thereby, input devices as described above are
able to detect different kinds of user interactions, e.g.
user commands trying to change the speed and/or direction of
browsing („speed-dependent scrolling"), to change the
abstraction level of presentation (which means to modify the
spatial, temporal and/or semantic layout) or to randomly
access pages of an electronic document (non-linear long
displacements, see description below).

Fig. 15 shows different examples of a user's input actions
for navigating through the pages of an electronic book by
moving a finger across a touch-sensitive display 1502 of an
electronic book device 1600 to control the speed of browsing
and/or the detail level of presentation to be displayed.
Besides so-called

- „displacement events" (rotational or translational
  movements of a user's finger in any direction across the
  touch-sensitive display 1502 for navigating through the
  pages of the electronic book 1600, the length of the
  movement path being directly proportional to the speed of
  browsing and/or the detail level of presentation to be
  displayed), so-called

- „stroke events" (forces or the duration of forces exerted
  by a user's finger to the surface of the touch-sensitive
  display 1502 to navigate through the pages of the
  electronic book 1600, said force being directly
  proportional to the speed of browsing and/or the detail
  level of presentation to be displayed)

addressed by the present invention can be used as input
actions to change the detail level and/or speed of browsing.
The user should be able to leaf through the pages of the
e-book as through the pages of a traditional book; in this
case the book can be browsed in a pseudo-random way, and the
fingers of the user can set the pace of browsing.
Speed-dependent, fuzzier, more intuitive interaction patterns
are also conceivable.
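
The two event types described above can be sketched as direct-proportionality mappings from touch input to browsing speed. The sampling of the finger path as (x, y) points, the function names and the gain constants are assumptions for illustration only.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def path_length(points: List[Point]) -> float:
    """Total length of the finger's movement path through the sampled points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def speed_from_displacement(points: List[Point], gain: float = 0.05) -> float:
    """Displacement event: speed is directly proportional to the path length."""
    return gain * path_length(points)


def speed_from_stroke(force: float, gain: float = 2.0) -> float:
    """Stroke event: speed is directly proportional to the exerted force."""
    return gain * force
```

The same pattern applies if the duration of the force is used instead of its magnitude; only the input quantity changes.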

The e-book system reacts to these actions by simulating the
human process of „leafing" through the pages of a book,
magazine or newspaper, and displaying the content by using
different semantic degrees. These semantic degrees are
generated by decomposing the original digital content into
multiple information modes or channels (text, pictures,
etc.). Additionally, any mode can be displayed on a different
abstraction level of presentation or with a different spatial
focus (e.g. key words for textual information, coarse-grained
pictures, etc.).

Three different possibilities of implementing the proposed 
video leafing model are shown in Figs. 18a+b. 

The diagrams depicted in Fig. 18a illustrate how the detail
level of presentation V for the spatial and temporal layout
of a displayed video sequence can be modified dependent on
user commands demanding a change in the speed of browsing. As
can be seen, V depends on the resolution and size of
displayed keyframe images 306a-e showing shots from a skimmed
video sequence and on the duration D of display,
respectively.

Fig. 18b shows a diagram 1800b of a non-linear 'long'
displacement action with a dynamic spatial-temporal-semantic
information layout. Thereby, the detail level of presentation
V is a function of time t; it increases or decreases with the
resolution and size of displayed keyframe images 306a-e.

The main challenges of the proposed multimedia leafing
approach are finding the right spatial-temporal layout of the
multimedia information to be displayed and the semantic
decomposition of complex multimedia content.

Figs. 11a-c, 12a-c and 13a-g show some possible examples of a
semantic reconfiguration of the layout of textual or
pictorial information during a „leafing" navigation.

An electronic paper page can be displayed with different
layouts depending on the speed of browsing. Figs. 11a-c show
three different layouts of the same electronic paper (e.g. a
Web page), wherein the semantic focus is set proportional to
the navigation speed of a user skimming the content of said
document while surfing the Internet. Thereby, each element on
a displayed Web page is weighted by its semantic value, which
can be derived empirically from the different typographic
parts of the page or can explicitly be added during the
editing phase by the authors. Increasing the browsing speed
reduces the number of elements and filters them according to
their semantic weight. Ideally, each displayed element is
able to change its physical appearance (e.g. dimensions,
colors) and its abstraction level of presentation.
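
The weighting and filtering of page elements can be sketched as a threshold that rises with the browsing speed. The `Element` type, the weight range [0, 1], the speed range of 1x to 16x and the linear threshold are assumptions chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    kind: str      # e.g. "title", "picture", "body" (hypothetical labels)
    weight: float  # semantic weight in [0, 1], 1 = most important


def elements_for_speed(elements: List[Element], speed: float,
                       max_speed: float = 16.0) -> List[Element]:
    """Keep only elements whose weight reaches a threshold that rises with speed."""
    s = min(max(speed, 1.0), max_speed)
    threshold = (s - 1.0) / (max_speed - 1.0)
    return [e for e in elements if e.weight >= threshold]
```

At normal speed the threshold is zero and every element is shown; at the assumed maximum speed only elements with the highest weight remain.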

As shown in Figs. 12a-c, the same concept can be generalized
by using different layouts. In this case, the page contains
three different elements (pictures, title, and body text).

Fig. 13 shows seven schematic layouts of a page from an
electronic paper during fast browsing, wherein the semantic
focus is set proportional to the navigation speed of a user
leafing through the electronic paper. In the first three
pages (1300a-c) the main layout is maintained; only very
important semantic objects are left (pictures, key words,
etc.). In the second three pages (1300d-f) the objects
themselves are spatially emphasized. The seventh page (1300g)
shows a completely new layout which is formed by composing N
pages containing semantically important objects.

For implementing the above system any kind of electronic
paper format can be used. Information decomposition can be
done automatically following some general heuristic rules or
predefined formatting tags (e.g. in HTML <h1></h1>,
<h2></h2>). For this purpose, special tags in the electronic
format are introduced for explicitly including different
semantic degrees. This additional information, which is
transparently disabled at normal reading speed, can be used
to properly change the layout of displayed documents during
fast navigation.
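
A heuristic decomposition based on existing formatting tags can be sketched with Python's standard `html.parser` module. The weight table below is an assumption invented for this example; the description does not give concrete values, and the special semantic tags it mentions are not modeled here.

```python
from html.parser import HTMLParser

# Assumed semantic weights per HTML tag (illustrative values only).
TAG_WEIGHTS = {"h1": 1.0, "h2": 0.8, "img": 0.7, "p": 0.3}


class WeightedElements(HTMLParser):
    """Collect (tag, weight, content) triples for the tags listed in TAG_WEIGHTS."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._open = None  # tag whose text content is currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # Images carry no text; record their source attribute instead.
            self.elements.append((tag, TAG_WEIGHTS[tag], dict(attrs).get("src", "")))
        elif tag in TAG_WEIGHTS:
            self._open = tag

    def handle_data(self, data):
        if self._open and data.strip():
            self.elements.append((self._open, TAG_WEIGHTS[self._open], data.strip()))

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None


def decompose(html: str) -> list:
    """Return the weighted elements found in an HTML fragment, in document order."""
    parser = WeightedElements()
    parser.feed(html)
    parser.close()
    return parser.elements
```

The resulting list can then be filtered by weight during fast navigation, so that headings and pictures survive longer than body text.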

According to a further embodiment of the present invention,
digital audio content can be „pre-heard" by using the same
navigation patterns described in the above sections.
Nowadays, audio „preview" is not done because the nature of
audio data does not allow fast navigation. Some techniques
have been introduced to browse audio quickly by reducing
silent intervals. However, techniques that apply only pure
audio fast forwarding have only very limited fields of
application.

In the case of digital audio data containing multiple types
of audio information (e.g. music with speech, pure speech,
rhythm, musical instruments, etc.), the same browsing
techniques used for video and text data can be applied.
According to the invention, a user can switch the content to
different modes (text, visual objects, etc.), speed up the
browsing of this content and come back to the normal audio
mode.

Furthermore, a semantic zoom function for pre-hearing audio
data by fast navigating through digital audio content is
proposed. For this purpose, some supporting metadata are
associated with a pure audio stream, e.g. text for the speech
parts, colors for different characteristics of the music
parts (e.g. different parts of a musical piece, melody,
rhythm and instrumentation of particular motifs, leitmotifs
or themes, etc.). All this information must be synchronized
with the original audio content. These metadata, embedded in
the audio stream (e.g. in the MPEG-7 format) or in a separate
data block (e.g. in the XML format), can be used for
navigating, pre-hearing, and making a digest of the audio
content.

Fig. 17 shows a graphical user interface of a client terminal
1006 running an application program for controlling an audio
player 1706 downloaded from an application server in a
client/server-based network environment, said audio player
1706 being capable of executing an audio leafing procedure
according to one embodiment of the present invention. The
user, while listening to the melody of a requested song, can
e.g. press a fast-forward button, and then e.g. the song's
lyrics are displayed and scrolled at a fast speed such that
the user can easily follow the fast text scrolling and search
for key words in order to restart listening from another
point of the audio track.
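
Metadata synchronized with the audio content, as described above, can be looked up by playback time, for example to find the text or color that belongs to the current position while fast-forwarding. The segment layout, the sample values and the function name in this sketch are assumptions for illustration.

```python
from bisect import bisect_right

# (start time in seconds, kind, value), sorted by start time; sample data only.
SEGMENTS = [
    (0.0, "color", "blue"),
    (30.0, "text", "speech transcript"),
    (75.0, "color", "red"),
]


def segment_at(segments, t: float):
    """Return the metadata segment active at playback time t, or None before the first."""
    starts = [s[0] for s in segments]
    i = bisect_right(starts, t) - 1  # index of the last segment starting at or before t
    return segments[i] if i >= 0 else None
```

The start time of a found segment can equally serve as the point from which listening is resumed after the user has spotted a key word.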

In case of pure audio content (such as e.g. a classical
symphony), this process can be done by using different colors
for indicating the different movements of a symphony or parts
of a movement (e.g. exposition, development, recapitulation,
and coda). In this case, a taskbar appears on a user's
desktop which indicates different movements or movement parts
of said symphony with the aid of different colors. Textual
information (e.g. a song's lyrics) can easily and quickly be
scrolled forward and backward.

Depicted Features and their Corresponding Reference Signs

No.         Technical Feature (System Component, Procedure Step)

100a-c      three diagrams showing different types and speeds of
            leafing through an illustrated magazine, thereby
            enabling a reader to focus on specific parts of a page
            and, depending on his/her specific interests, browse
            respectively quicker or slower through the content of
            said magazine

100d        flow chart illustrating an algorithm which approximates
            the mental process of a person while leafing through an
            illustrated magazine

200         diagram illustrating the multimedia decomposition
            process applied to the content of a video sequence
            presented by a multimedia preview system

202a-c      spatial layouts showing different detail levels of
            presentation for the content of a video sequence to be
            previewed in text and/or image

300         diagram showing a schematic example of a spatial layout
            showing different detail levels of presentation for the
            content of a video sequence to be previewed in text
            and/or image

302         progress bar showing the playing time of a video
            sequence displayed on the screen or display of said
            user's client terminal as a percentage of the total
            playing time

304,        virtual keys for indicating direction and speed for
304a-d      playing, forwarding or rewinding said video sequence

306a-e      high-resolution large-sized keyframe images showing
            shots from said video sequence

308a-h      low-resolution small-sized keyframe images (so-called
            „thumb nails") showing particular shots from said video
            sequence

310         frame for displaying dynamically changing text
            containing dialogs from a video sequence displayed on
            said client terminal

312         frame for displaying dynamically changing text
            containing a summary of the underlying story for the
            video sequence displayed on said client terminal

314         video data encompassing video frames from said video
            sequence

316         audio data encompassing audio frames (music, sound
            and/or speech) added to said video sequence

400         program sequence showing an XML-based representation of
            metadata which is used for browsing the content of
            multimedia data to be previewed

500         timing diagram of a virtually structured movie

502a        keyframe track showing the display times of the large-
            and small-sized images 306a-e and 308a-h, respectively

502b        video track showing the playing times of said video
            data 314

502c        text track showing the display times of said dialog
            text 310

502d        text track showing the display times for the summary
            text 312 of the underlying story

502e        audio track showing the playing times of said audio
            data 316

600a        diagram showing different input and navigation control
            devices which can be used as human-machine interfaces
            for previewing a video sequence

600b        diagram showing an example of browsing through a video
            sequence by using the „semantic zoom" function offered
            by the video preview system according to one embodiment
            of the present invention

602a        three-key touch pad display serving as a human-machine
            interface for navigating through a list of symbols for
            playing, slow/fast scrolling or skimming a video
            sequence

602b        rolling mouse serving as a human-machine interface for
            performing the aforementioned navigating actions

602c        remote control device having control keys for executing
            functions of a video cassette recorder (VCR)

700a-d      four diagrams illustrating the dynamic presentation
            layout during a browsing process

800a        bendable PDA

800b        credit-card sized display device („tourist browser")
            with an integrated navigation system

800c-e      different detail levels of a virtual map displayed on
            an integrated display 802a (802b) of the bendable PDA
            800a or credit-card sized display device 800b,
            respectively

800f        rear side of the credit-card sized display device 800b,
            which comprises an integrated touch pad 804a

802a        rigid display of the bendable PDA 800a

802b        flexible OLED display of the credit-card sized display
            device 800b, based on organic polymer electronics

804a        touch pad of the bendable PDA 800a, which is used for
            steering a cursor displayed on the PDA's display 802a

804b        touch pad on the rear side 800f of the credit-card
            sized display device 800b, which is used for steering a
            cursor displayed on the display 802b of the credit-card
            sized display device 800b

900         diagram showing a DVD cover with integrated display
            capabilities which can be used as a human-machine
            interface for remotely controlling a video preview
            system according to the present invention

902         integrated display of said DVD cover 900, used for
            browsing the content of a video sequence to be
            previewed in text and/or image

904         leafing output data displayed on the integrated display
            902 of said DVD cover 900

906a+b      navigation-sensitive areas of said DVD cover 900

1000        multimedia preview system in a client/server-based
            network environment for browsing the content of
            requested multimedia data to be previewed

1002        multimedia server in said video-on-demand system 1000
            for browsing the content of requested multimedia data
            to be previewed

1004a       any data carrier of a file-serving system connected to
            said multimedia server 1002, said file-serving system
            storing the multimedia data to be previewed

1004b       XML-based representation of metadata associated to the
            content of said multimedia data, used for browsing said
            multimedia data

1006        client terminal having a display for previewing said
            multimedia data

1100a-c     three different layouts of the same electronic paper
            (e.g. a Web page)

1200a-c     three schematic layouts of an electronic paper
            consisting of different frames for displaying pictures,
            titles and body text

1202        frame for displaying pictures referring to text
            passages contained in said Web document

1204        frame for displaying titles of text passages contained
            in said Web document

1206        frame for displaying body text of said Web document

1300        seven schematic layouts of a page from an electronic
            paper during a fast browsing

1300a-c     three layouts of said page, wherein the main layout is
            maintained irrespective of a user's speed of browsing
            but during a fast browsing only very important semantic
            objects (pictures, key words, etc.) are left

1300d-f     three further layouts of said page, wherein displayed
            objects (pictures and/or text passages) are spatially
            emphasized during a fast browsing

1300g       a still further layout of said page, obtained after
            having composed the content of N pages for displaying
            semantically important objects only

1400        three examples of input and control devices which can
            be used for creating „leafing events" needed for
            browsing the content of an electronic paper according
            to the present invention

1400a       diagram showing a user's forefinger 1402 navigating
            through electronic documents stored in an electronic
            book device 1405 by touching various elements of an
            integrated linear-scaled touch-sensitive stripe 1404a

1400b       diagram showing a user's forefinger 1402 navigating
            through electronic documents stored in an electronic
            book device 1405 by shifting and/or turning a
            cylindrical control device 1404b

1400c       diagram showing a user's forefinger 1402 navigating
            through electronic documents stored in an electronic
            book device 1405 by moving over a nonlinear-scaled
            touch-sensitive stripe 1404c

1402        a user's forefinger leafing through the pages of e.g.
            an electronic paper

1404a       linear-scaled touch-sensitive stripe for navigating
            through the pages of the electronic book device 1405

1404b       cylindrical control device for navigating through the
            pages of the electronic book device 1405

1404c       nonlinear-scaled touch-sensitive stripe for navigating
            through the pages of the electronic book device 1405

1405        electronic book device

1500        schematic diagram showing examples of a user's input
            actions for navigating through the pages of an
            electronic book 1600 by moving a finger across a
            touch-sensitive display 1502 to control the speed of
            browsing and/or the detail level of presentation to be
            displayed

1502        touch-sensitive display or any other touch-sensitive
            surface of the electronic book device 1600

1600        three-dimensional schematic view of an electronic book
            device having a touch-sensitive display 1502 for
            inputting control information needed to increase or
            decrease the speed of browsing and/or the detail level
            of presentation to be displayed through the pages of
            the electronic book

1700        graphical user interface of a client terminal 1006
            running an application program for controlling an audio
            player 1706 downloaded from an application server in a
            client/server-based network environment

1702        virtual fast-forward keys, displayed on the screen of
            the client terminal 1006 controlling said audio player

1704        display of said client terminal 1006 for displaying
            browsed text passages of a requested song's lyrics

1706        audio player, downloaded from an application server

1800a       diagram illustrating a speed-dependent
            spatial-temporal-semantic information layout

1800b       diagram showing a non-linear 'long' displacement action
            with a dynamic spatial-temporal-semantic information
            layout

S0          step #0: downloading multimedia data from the
            multimedia server 1002 to said client terminal 1006 via
            a network link

S1a         step #1a: said multimedia server 1002 receiving user
            commands demanding a change in the speed of browsing
            and/or in the abstraction level of presentation, in the
            following referred to as „representation parameters"

S1b         step #1b: said multimedia server 1002 processing said
            user commands

S2          step #2: decomposing said multimedia data into
            non-redundant and redundant, less relevant parts
            according to an offline image and/or text segmentation
            algorithm

S3          step #3: adapting said representation parameters by
            online filtering out (S3') a certain amount of said
            redundant, less relevant parts depending on type and/or
            frequency of said user commands such that the degree of
            presented details is the higher the lower the speed of
            presentation and vice versa

S4          step #4: displaying an adapted version of said
            multimedia data on said client terminal 1006

S5a         step #5a: associating metadata of any kind allowing
            users to identify segmented parts of multimedia data to
            be previewed to said multimedia data

S5b         step #5b: synchronizing said metadata with said
            multimedia data

S100a       step #100a: query whether a reader knows enough about a
            specific topic

S100b       step #100b: coarse-grained leafing, which means leafing
            a document from a Web page, an electronic book or an
            electronic magazine at fast speed and extracting only
            coarse content (e.g. pictures and titles)

S100c       step #100c: query whether a specific page of said
            document is selected

S100d       step #100d: finer-grained scanning, which means
            scanning the content around said pictures or titles for
            getting more details

S100e       step #100e: query whether said reader is further
            interested

S100f       step #100f: fine-grained reading, which means reading
            the content of said document in detail

Claims 






1. A multimedia preview system in a client/server-based net- 
work environment for browsing the content of requested multi- 
media data to be previewed, said content being displayed on a 
client terminal (1006) accessing a multimedia server (1002) 
which holds said multimedia data, 
characterized by 

controlling means (602a-c, 800a/b, 900, 1404a-c, 1600) for
adapting the speed of browsing and/or the abstraction level
of presentation in text and/or image depending on type and/or
frequency of user commands instructing the multimedia preview
system (1000) to browse either quicker or slower through the
content of said multimedia data such that the degree of
presented details is the higher the lower the speed of
presentation and vice versa.

2. A multimedia preview system according to claim 1,
characterized in that

said multimedia preview system (1000) is realized as a
video-on-demand system with an additional video browsing
functionality for varying the speed and detail level of
presentation depending on type and/or frequency of user
commands instructing the multimedia preview system (1000) to
change the speed of browsing such that said detail level is
the higher the lower the speed of presentation and vice
versa.

3. A multimedia preview system according to claim 1 or 2,
characterized in that

said controlling means (602a-c, 800a/b, 900, 1404a-c, 1600)
comprises a touch-sensitive display (1502) for navigating
through the multimedia data to be previewed.



4. A method for browsing the content of multimedia data to be 
previewed, said content being displayed on a client terminal 
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(1006) accessing a multimedia server (1002) which holds said 
multimedia data, 
characterized by the steps of 

— downloading (SO) said multimedia data from the multimedia 

5 server (1002) to said client terminal (1006) via a network 

link, 

— said multimedia server (1002) receiving (Sla) and process- 
ing (Sib) user commands demanding a change in the speed of 
browsing and/or in the abstraction level of presentation, 

10 in the following referred to as „ representation parame- 

ters'" , 

— decomposing (S2) said multimedia data into non-redundant 
and redundant, less relevant parts, 

— adapting (S3) said representation parameters by online 
filtering out (S3') a certain amount of said redundant, less 
relevant parts depending on type and/or frequency of said 
user commands such that the degree of presented details is 
the higher the lower the speed of presentation and vice 
versa, and 

— displaying (S4) an adapted version of said multimedia data 
on said client terminal (1006). 

5. A method according to claim 4, 
characterized by the steps of 

— associating (S5a), with said multimedia data, metadata of 
any kind allowing users to identify segmented parts of the 
multimedia data to be previewed, and 

— synchronizing (S5b) said metadata with said multimedia 
data. 


6. A method according to any one of claims 4 or 5, 
characterized by 

said user commands being movements of a user's finger across 
a touch-sensitive display (1502) according to claim 3, the 
length of the movement path being directly proportional to 
the speed of browsing and/or the detail level of presentation 
when displaying said multimedia data. 

7. A method according to any one of claims 4 or 5, 
characterized by 

said user commands being forces exerted by a user's finger to 
the surface of a touch-sensitive display (1502) according to 
claim 3, said force being directly proportional to the speed 
of browsing and/or the detail level of presentation when 
displaying said multimedia data. 

8. A method according to any one of claims 4 or 5, 
characterized by 
said user commands being the duration of forces exerted by a 
user's finger to the surface of a touch-sensitive display 
(1502) according to claim 3, said duration being directly 
proportional to the speed of browsing and/or the detail level 
of presentation when displaying said multimedia data. 
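
The direct proportionality recited in claims 6 to 8 can be illustrated with a minimal Python sketch. It is not taken from the specification: the names `TouchCommand`, `GAIN` and `browsing_speed`, the units, and the proportionality constants are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical proportionality constants (speed units per input unit).
# The claims only require direct proportionality, not these values.
GAIN = {
    "path_length": 0.01,  # per pixel of finger movement (claim 6)
    "force": 0.5,         # per newton of finger pressure (claim 7)
    "duration": 1.0,      # per second of sustained pressure (claim 8)
}


@dataclass(frozen=True)
class TouchCommand:
    """One user command read from the touch-sensitive display (1502)."""
    kind: str     # "path_length", "force" or "duration"
    value: float  # measured magnitude in the unit assumed in GAIN


def browsing_speed(cmd: TouchCommand) -> float:
    """Return a browsing speed directly proportional to the measured input."""
    if cmd.kind not in GAIN:
        raise ValueError(f"unknown command kind: {cmd.kind!r}")
    if cmd.value < 0:
        raise ValueError("measured magnitude must be non-negative")
    return GAIN[cmd.kind] * cmd.value
```

Under these assumptions, doubling the path length, force or duration doubles the resulting browsing speed. A real device would presumably also limit the result to a supported range, which this sketch leaves out.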
Abstract 

The present invention generally relates to a system (1000) 
for efficiently previewing multimedia content (video, audio, 
and/or text data). In this connection, a conceptual framework 
is introduced that defines basic mechanisms for simulating a 
„leafing through" of digital content (e.g. the content of an 
electronic book, a digital video or audio file), comparable 
to the process of leafing through the pages of a book, 
magazine or newspaper. 

According to a method for browsing the content of requested 
multimedia data to be previewed, said content is displayed 
on a client terminal (1006) accessing a multimedia server 
(1002) which holds said multimedia data. After said multimedia 
data have been downloaded (S0) from the multimedia server 
(1002) to said client terminal (1006) via a network link, 
said multimedia server (1002) receives (S1a) and processes 
(S1b) user commands demanding a change in the speed of 
browsing and/or in the abstraction level of presentation, in 
the following referred to as „representation parameters". 
After that, said multimedia data are decomposed (S2) into 
non-redundant and redundant, less relevant parts according to 
an offline image and/or text segmentation algorithm. These 
representation parameters are then adapted (S3) by online 
filtering out (S3') a certain amount of said redundant, less 
relevant parts depending on type and/or frequency of said 
user commands such that the degree of presented details is 
the higher the lower the speed of presentation and vice 
versa. Finally, an adapted version of said multimedia data is 
displayed (S4) on said client terminal (1006). 
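
The adaptation step (S3, S3') summarized above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Segment` type, its `relevance` score, the linear threshold rule and the `faster`/`slower` command names are hypothetical and not part of the disclosure. The sketch assumes that the offline segmentation (S2) has already marked each part as redundant or non-redundant.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """One part of the decomposed multimedia data (result of step S2)."""
    label: str
    redundant: bool
    relevance: float  # hypothetical score in [0, 1] for redundant parts


def apply_command(speed: float, command: str,
                  step: float = 1.0, max_speed: float = 4.0) -> float:
    """Update the browsing speed; repeated commands accumulate (S1a/S1b)."""
    if command == "faster":
        return min(speed + step, max_speed)
    if command == "slower":
        return max(speed - step, 0.0)
    raise ValueError(f"unknown command: {command!r}")


def adapt(segments: List[Segment], speed: float,
          max_speed: float = 4.0) -> List[Segment]:
    """Filter out redundant parts online (S3'): the higher the speed,
    the fewer details remain. Non-redundant parts are always kept."""
    if not 0.0 <= speed <= max_speed:
        raise ValueError("speed out of range")
    threshold = speed / max_speed  # assumed linear rule
    return [s for s in segments
            if not s.redundant or s.relevance >= threshold]
```

With this rule, a speed of zero keeps every part, while the maximum speed keeps the non-redundant parts plus any redundant part with a relevance of 1.0. A faster presentation therefore always shows the same or fewer details than a slower one.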